Abstract

When people use language spontaneously, they often do not adhere strictly to commonly accepted
standards of grammaticality. The primary objective of this project is to develop flexible computer
parsing techniques which can deal with the various kinds of ungrammaticalities that arise, both on the
lexical and the phrase level.

The  progress  towards  this  goal  covered  by  this  report  includes: 


• The completion of the development and the evaluation of CASPAR and DYPAR, two
experimental parsers based on the construction-specific approach to parsing, this
approach having been formulated through experience with FlexP, a flexible parser
developed earlier under this contract.


• Further development of the construction-specific approach to parsing through the design
and construction of MULTIPAR. Like CASPAR and DYPAR, MULTIPAR is based on
construction-specific parsing techniques, but aims for much greater linguistic coverage,
serving as a vehicle to test whether the construction-specific approach scales up in a
more realistic parser. In particular, this work involved development of additional
construction-specific parsing strategies and of a control structure through which a large
number of such strategies could be coordinated on the parsing of a single input.

• Additional application of the construction-specific approach to flexible parsing to the
parsing of an artificial command language in the parser for the Cousin command
interface, a graceful interface for the Unix operating system being developed largely
under other funding. This effort represents a parallel track of development and proving
ground for the construction-specific approach to parsing.

1. Research Objectives

1. When people use language spontaneously, they often do not adhere strictly to commonly
accepted standards of grammaticality. The primary objective of this project is to develop
flexible computer parsing techniques which can deal with the various kinds of
ungrammaticalities that arise, both on the lexical and the phrase level. The kinds of
ungrammaticality we wish to deal with include at the lexical level:

• misspelt words

•  novel  words  whose  role  can  be  inferred  from  context 

•  erroneous  segmentation  between  words  (arising  from  the  omission  of  spaces,  or 
the  inclusion  of  spurious  spaces  or  punctuation) 

• lexical items which are entered in one form and then changed to another

and at the phrase level:

• input which is broken off and then restarted

•  interjected  words  and  phrases 

• omitted or substituted words and phrases

•  fragmentary  or  otherwise  elliptical  input 

• agreement failure

• idioms

2. The design space for parsers is very large. We aim to develop a set of design choices
which will result in parsers well suited to our primary goal. The design choices we are
currently using are listed below.

• bottom-up rather than top-down parsing, except in certain situations in which
top-down prediction is highly constraining

•  use  of  several  different  parsing  strategies,  each  tailored  to  a  particular  type  of 
construction,  and  selected  between  on  a  dynamic  basis 

• provision for the suspension and later resumption of a partial parse at a
non-adjacent part of the input string

3. We intend to develop flexible parsing techniques in the context of interfaces to interactive
computer systems. We are working with two types of interface language:

a. limited-domain natural languages, i.e. languages with the syntax of (possibly a
subset of) natural language, but whose semantics are limited to those of the
interactive system being interfaced to.

b. more restrictive artificial languages of the sort currently found in computer
interfaces

Later, we intend to investigate how easily the techniques developed for these kinds of
languages can be transferred to more general natural language.

4.  We  intend  to  investigate  formalisms  for  specifying  domain-dependent  grammars  in  a 
convenient  way  for  both  of  the  types  of  language  mentioned  above. 

2.  Status  of  the  Research  Effort 


2.1. Overview

The  work  covered  by  this  contract  from  its  start  in  1979  has  involved  several  distinct  phases: 

• The initial development of the FlexP flexible parser for restricted domain natural
languages, and its evaluation in a gracefully interacting interface to an electronic mail
system, covering a period from the start of the contract in July 1979 to early 1981.

• Review of the initial design choices for FlexP in the light of this evaluation, leading to the
formulation of the construction-specific approach to parsing, and its preliminary
evaluation for applied natural language processing through the experimental parsers
CASPAR and DYPAR. This covered the period from early 1981 to the end of the year.

• Further development of the construction-specific approach to parsing through the design
and construction of MULTIPAR. Like CASPAR and DYPAR, MULTIPAR is based on
construction-specific parsing techniques, but aims for much greater linguistic coverage.
The development of MULTIPAR started at the beginning of 1982 and is still continuing.

• Additional application of the construction-specific approach to flexible parsing to the
parsing of an artificial command language in the parser for the Cousin command
interface, a graceful interface for the Unix operating system being developed largely
under other funding. This effort started in mid-1981, and represents a parallel track of
development of the construction-specific ideas mentioned above.

The initial design and development of FlexP and the lessons learned from it leading to the
formulation of the construction-specific approach to parsing have been described in the two previous
annual reports and so will not be described further here. Even greater detail is contained in the
following publications: [2, 3, 1, 4, 5]. This report covers only the period July 1, 1981 to June 30, 1982
and will therefore confine itself to the following topics:

•  The  completion  of  the  development  and  the  evaluation  of  CASPAR  and  DYPAR. 

•  The  design  and  continuing  development  of  MULTIPAR. 

• The design and development of the flexible parser for the Cousin user-friendly interface.


Separate  sections  on  each  of  these  topics  follow. 


2.2.  Final  Development  and  Evaluation  of  CASPAR  and  DYPAR 

The final development of both CASPAR and DYPAR was completed in a way faithful to their initial
designs, which were described in last year's annual report and in greater detail in [2, 3, 1, 4, 5]. To
avoid repetition, this report will confine itself to the evaluations and conclusions that were obtained
from our experience with these two simple parsers after first reviewing their motivation and design.

CASPAR and DYPAR arose out of our evaluation of FlexP, which showed that uniform parsing
strategies and grammar representations had significant disadvantages for the parsing of
ungrammatical input. Because parsing using several different construction-specific strategies was a
novel approach, we decided to try out the ideas in two simplified parsers, CASPAR and DYPAR,
instead of trying to implement a full-scale parser immediately. CASPAR was designed to show the
suitability of construction-specific techniques for ungrammatical input, while DYPAR served as a
vehicle to investigate the control problems of coordinating several distinct parsing strategies.

The conclusions obtained from our experience with CASPAR and DYPAR can be summarized as
follows:

• The parsing strategy used by CASPAR was tailored directly to imperative case frames.
This strategy proved highly successful in dealing with ungrammatical input, much more
successful than the uniform strategy employed by FlexP. This result encouraged us to
believe that the degree of incisiveness afforded by construction-specific techniques
would provide similar advantages across a wide range of constructions.

• Besides performing well on ungrammatical input, CASPAR's construction-specific
strategy also in many cases performed more efficiently than the uniform strategy
employed by FlexP because of a decrease in the amount of searching necessary. This
suggested that a similar gain in efficiency could be obtained through similar techniques
for other construction types.

• In the case of both CASPAR and DYPAR, the coordination of the various
construction-specific strategies was implemented through the code of the parsers themselves. This
would have made it very difficult to add new strategies to the ones already there. A more
flexible method for the coordination of multiple strategies is clearly necessary to pursue
the concept of multi-strategy construction-specific parsing, since any parser of that type
with a wide linguistic coverage would in practice have to build up to a full range of
strategies incrementally. Only with very small parsers like CASPAR and DYPAR is it
possible to think of all the required strategies in advance and to preprogram their

• As we have shown elsewhere [4], ambiguities that cannot be resolved by a flexible parser
should be resolved in a user-friendly system through a tightly focused interaction with the
person who provided the input. Focused interaction requires localized representation of
ambiguity, and our experience with CASPAR suggests that this is easier to achieve using
a construction-specific rather than a uniform approach. In CASPAR, we analyzed each
construction type for the possible types of ambiguity it can give rise to, devised
representations for these ambiguity types, and constructed the parsing techniques so
that they could recognize each relevant type of ambiguity and generate the appropriate
representation. Such localized ambiguity representations would have been impossible to
construct with the uniform approach of FlexP.

• To be widely applicable, limited-domain parsers must provide a convenient way to define
languages in the class that they recognize. The construction-specific approach offers
the advantage that its specialized parsing techniques can operate directly from such
domain-oriented language definitions without the need for a time-consuming compilation
phase as was necessary with the uniform approach of FlexP. This reasoning rests on the
assumption that the most convenient way to express the language from the application
point of view is sufficiently close to the "natural" constructions of the language that
direct interpretation is possible. Our experience with CASPAR suggests that this
assumption is valid.

The conclusions listed here have served as guiding principles in the development of the
MULTIPAR parser and of the parser for the Cousin user-friendly interface, as described in the
following sections.

2.3.  MULTIPAR 

Given the largely positive experience with CASPAR and DYPAR, we embarked around the
beginning of 1982 on the design and implementation of a new parser that we call MULTIPAR, based
on the same construction-specific ideas as CASPAR and DYPAR, but incorporating many more
construction types. We viewed MULTIPAR as a vehicle for testing whether the construction-specific
approach scaled up from the simple pilot parsers already constructed to a parser with adequate
coverage for a realistic natural language interface.

DYPAR and CASPAR used two and three different parsing strategies respectively; coordination
between these strategies was simple and was "hard-wired" directly into the control structure of the
parsers themselves. The much larger number of strategies needed to provide adequate linguistic
coverage and the need to make the addition of new strategies easy precluded this "hard-wired"
approach for MULTIPAR. The two principal initial objectives in the development of MULTIPAR were:

• The development of construction-specific strategies for a number of additional
construction types.

• A control structure which allowed multiple strategies to interact together without their
coordination being "hard-wired" into MULTIPAR.

We  will  describe  progress  on  each  of  these  points  separately. 

2.3.1. Additional construction-specific techniques

In CASPAR and DYPAR we concentrated on two main kinds of construction: imperative case
frames and linear patterns. Clearly, these types fall far short of covering all the constructions
common in restricted domain natural languages, so to develop MULTIPAR, it was necessary to
identify and devise specific parsing techniques for those constructions commonly found in such
languages, but not covered by our existing parsers. Such constructions include: noun groups (with
determiners, adjectives, classifiers, and post-nominal cases), declarative and interrogative case
frames (in addition to the imperative ones we already handle), wh-questions, relative clause modifiers,
conjunction at the noun group and clause level (general use of conjunction may not be necessary in a
restricted domain language), and comparatives. We have compiled these construction types as a
minimal set necessary for basically habitable restricted-domain natural languages. Other
constructions may have to be included later.

For  each  of  these  construction  types,  the  following  steps  are  necessary: 

• analyze the structure of the construction, paying particular attention to highly restricted,
or otherwise easy to identify components;

• devise parsing techniques which take advantage of the structure to parse correct
versions of the construction correctly and efficiently;

•  extend  these  techniques  to  recover  robustly  and  efficiently  from  situations  in  which  the 
construction  is  used  incorrectly  or  ungrammatically  wherever  such  recovery  is  possible. 

In keeping with the construction-specific approach, all this work should be oriented to extracting the
maximum possible leverage from characteristics specific to the individual construction types.

By the end of the contract period that is the subject of this report, we had completed these steps for
some, but not all, of the construction types listed above. The construction types covered during that
period include noun groups (with determiners, adjectives, and classifiers), declarative and
interrogative case frames, and wh-questions.

2.3.2. Control structure

As noted above, CASPAR and DYPAR used two and three different parsing strategies respectively.
Furthermore, coordination between these strategies was simple and was "hard-wired" directly into
the control structure of the parsers themselves. The much larger number of strategies needed to
provide adequate linguistic coverage and the need to make the addition of new strategies easy
precluded this "hard-wired" approach for MULTIPAR. Instead, we required a control structure which
allows large numbers of strategies to cooperate on and share information about the parsing of a given
input. Based on these considerations and our previous experience with flexible parsing, we
established the following goals for the control structure of MULTIPAR:

• Integration of a large number of highly specific and specialized parsing strategies. There
may well be several strategies applicable in any given situation.

• Ability to parse bottom-up from the best information available. It is never possible to rely
absolutely on any specific piece or feature of a construction being correct.

• As much top-down control as possible. While bottom-up parsing is necessary to form an
initial hypothesis about what the structure of an input may be, it is inefficient once that
hypothesis has been formed.

• Clean separation between domain semantics and parsing strategies. This is most
important because of our intention to apply MULTIPAR to a significant number of
different domains.

The following design for MULTIPAR covers the first three of these goals and neither addresses nor
contradicts the fourth. The control structure of MULTIPAR involves the following three kinds of
structure:

• tasks: A task represents the goal of recognizing as much as possible of a given
subsequence of the input as a certain kind of grammatically specified object (e.g. a task
might be to recognize as much as possible between the second and seventh words of "is
the price of a display terminal more than a hardcopy terminal" as a <comparable-object>,
where <comparable-object> was a grammatical subcategory; remember we are working
with restricted-domain language, and therefore, semantic type grammars). Such tasks
may specify that the recognition is to be left or right anchored if the whole subsequence
cannot be parsed as the desired object. MULTIPAR is driven at the top level by a task to
recognize the whole of an input line as a grammatical super-category, which includes all
complete sentences as well as individual objects, and anything else the system being
interfaced to is prepared to interpret in isolation. In cases where elliptical replies are
expected, the top-level task might be to recognize an object of the type expected.

• strategies: A strategy is a method for recognizing a given grammatical constituent.
There may be several strategies applicable to any given grammatical category, and a
given strategy may apply to more than one type of constituent. Strategies are indexed by
grammatical category. Each strategy has a simple initial test based on pattern matching
to check applicability to a specific task (i.e. recognizing a given constituent in a given
context with possible left or right anchoring), plus a more complicated procedural test of
applicability to be applied if the pattern match succeeds. Each strategy has an indication
of the amount of grammatical deviation it is designed to cope with, which will correspond
roughly to the amount of effort needed to apply it. Strategies may also be limited to left or
right anchored recognition.

•  hypotheses:  An  hypothesis  is  the  result  of  applying  a  specific  strategy  to  a  specific  task 
and  constitutes  the  result  of  the  parsing  attempt,  thus  specified.  Hypotheses  are 
recorded  globally  in  a  blackboard-like  structure.  Both  successful  and  unsuccessful 
attempts  are  thus  recorded,  and  constitute  a  way  of  sharing  effort  between  different 
strategies.  The  successful  ones  are  analogous  to  (partial)  parse  trees. 

These  three  types  of  structure  work  together  as  follows: 

1. The top-level task is set up as described above.

2.  Given  a  task,  all  strategies  whose  indexing  identifies  them  as  suitable  for  that  task  are 
identified,  and  grouped  according  to  degree  of  grammatical  deviation  handled. 

3. The strategies are applied in order of ascending ungrammaticality until one succeeds. All
strategies for a given level of ungrammaticality are applied (conceptually) in parallel.

4. Application of a strategy means first checking for a precomputed result in the global
blackboard of hypotheses, then applying the pattern-match test, then the procedural test,
and then if that succeeds, the body of the strategy.

5.  The  body  of  a  strategy  can  set  up  new  tasks,  and  the  strategy  as  a  whole  succeeds  if  the 
sub-tasks  succeed. 

6. A task succeeds if one or more of its strategies succeed.
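
The six steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MULTIPAR implementation: the class names, the dictionary used as the blackboard, and the two toy strategies for sums of money are all hypothetical, and "conceptually in parallel" is approximated by collecting every success at one deviation level before moving to the next.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Task:
    """Goal: recognize words[start:end] as an object of `category`."""
    category: str
    start: int
    end: int


@dataclass
class Strategy:
    name: str
    category: str        # strategies are indexed by grammatical category
    deviation: int       # 0 = strict; larger = more ungrammaticality tolerated
    pattern_test: Callable[[List[str]], bool]     # cheap initial test
    procedural_test: Callable[[List[str]], bool]  # costlier applicability test
    body: Callable[["Parser", Task], Optional[object]]


class Parser:
    def __init__(self, words: List[str], strategies: List[Strategy]):
        self.words = words
        self.index: Dict[str, List[Strategy]] = {}
        for s in strategies:
            self.index.setdefault(s.category, []).append(s)
        # Blackboard of hypotheses: successes and failures are both recorded.
        self.blackboard: Dict[Tuple[str, Task], Optional[object]] = {}

    def apply(self, strategy: Strategy, task: Task) -> Optional[object]:
        key = (strategy.name, task)
        if key in self.blackboard:               # step 4: precomputed result
            return self.blackboard[key]
        segment = self.words[task.start:task.end]
        result = None
        if strategy.pattern_test(segment) and strategy.procedural_test(segment):
            result = strategy.body(self, task)   # step 5: body may call solve()
        self.blackboard[key] = result
        return result

    def solve(self, task: Task) -> List[object]:
        candidates = self.index.get(task.category, [])           # step 2
        for level in sorted({s.deviation for s in candidates}):  # step 3
            group = [s for s in candidates if s.deviation == level]
            results = [self.apply(s, task) for s in group]
            results = [r for r in results if r is not None]
            if results:                                          # step 6
                return results
        return []


def is_money(seg):
    return len(seg) == 1 and seg[0][:1] == "$" and seg[0][1:].isdigit()


def is_bare_number(seg):
    return len(seg) == 1 and seg[0].isdigit()


def always(seg):
    return True


# Two hypothetical strategies for the same category at different deviation levels.
strategies = [
    Strategy("money", "sum-of-money", 0, is_money, always,
             lambda p, t: ("money", int(p.words[t.start][1:]))),
    Strategy("money-no-sign", "sum-of-money", 1, is_bare_number, always,
             lambda p, t: ("money", int(p.words[t.start]))),
]

parser = Parser(["100"], strategies)
hypotheses = parser.solve(Task("sum-of-money", 0, 1))   # step 1: top-level task
```

Here the strict strategy fails on "100", its failure is recorded on the blackboard, and the more tolerant strategy then succeeds at the next deviation level.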

To make the preceding description rather more concrete, we present some example strategies, and
show how they would function in parsing some example inputs. The linguistic examples are drawn
from the domain of a computer sales assistant. A very simple MULTIPAR strategy is:

StrategyName:  comparative-sentence

Recognizes:  <complete-sentence>

Pattern:  [<be> $X <comparative> $Y]

Body:  set up subtasks of recognizing input segments represented by X and Y as
<comparable-object>s.

Most of the work in this strategy is done by the simple pattern-matching rule which is its initial test. To
see how it might operate consider the input

Is the price of a display terminal more than $100

The strategy would be applicable, and would isolate "the price of a display terminal" as X and "$100"
as Y. The two subtasks of parsing X and Y as <comparable-object>s would then be established, with
the first being parsed by a strategy which recognized constructions of the "<attribute> of <object>"
type, and the second by one which recognized strings beginning with a dollar sign and followed by
digits as sums of money. The strategy would also check that the two quantities were comparable before
reporting success, trying coercion at a more flexible stage, and thus making sense of "is a display
terminal more than $100".
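
As a rough illustration of the initial pattern test, the following Python sketch matches the pattern [<be> $X <comparative> $Y] against a list of words and returns the segments bound to X and Y. The membership of the <be> and <comparative> word classes and the function names are assumptions made for the example; the comparability check and coercion mentioned above are not modelled.

```python
from typing import List, Optional, Tuple

BE_WORDS = {"is", "are", "was", "were"}              # assumed <be> class
COMPARATIVES = [("more", "than"), ("less", "than")]  # assumed <comparative> class


def match_comparative(words: List[str]) -> Optional[Tuple[List[str], List[str]]]:
    """Match [<be> $X <comparative> $Y]; return the (X, Y) segments or None."""
    if not words or words[0].lower() not in BE_WORDS:
        return None
    for i in range(2, len(words) - 1):           # X must be non-empty
        for comp in COMPARATIVES:
            j = i + len(comp)
            found = tuple(w.lower() for w in words[i:j])
            if found == comp and j < len(words):  # Y must be non-empty
                return words[1:i], words[j:]
    return None


def is_sum_of_money(segment: List[str]) -> bool:
    """A dollar sign followed by digits, as in the second subtask."""
    return (len(segment) == 1
            and segment[0].startswith("$")
            and segment[0][1:].isdigit())


words = "is the price of a display terminal more than $100".split()
x, y = match_comparative(words)
```

With this input, `x` holds the words of "the price of a display terminal" and `y` holds "$100", which `is_sum_of_money` accepts.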

A  more  complicated  strategy  is: 

StrategyName:  imperative-caseframe

Recognizes:  <complete-sentence>

Pattern:  [<action-word> $X] (a more flexible version would not be left anchored)

Body:  Obtain the case frame of the action word. Scan the input segment represented by
X for case markers from that case frame. This divides X up into a number of
segments separated by case markers. Set up tasks to recognize objects of the
type indicated by the preceding marker for each segment, making allowance for
direct and indirect objects.

This  second  strategy  is  very  similar  to  the  dominant  strategy  of  the  CASPAR  parser  mentioned  above. 
An  example  input  to  which  it  would  be  applicable  is: 
replace  the  display  terminal  with  a  teletype 

Here "replace" is the action word and "with" is a marker from its case frame. This isolates "the
display terminal" and "a teletype" which can be parsed as objects of the appropriate type, in this
case <component-set>s.
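
The scanning step of this strategy might look as follows in Python. This is a minimal sketch under assumptions: the case frame table, the case names "direct-object" and "replacement", and the function name are invented for illustration, and a repeated marker simply starts its case over.

```python
from typing import Dict, List, Optional

# Hypothetical case frame table: action word -> {case marker: case name}.
CASE_FRAMES: Dict[str, Dict[str, str]] = {
    "replace": {"with": "replacement"},
}


def split_on_case_markers(words: List[str]) -> Optional[Dict[str, List[str]]]:
    """Apply the pattern [<action-word> $X] and segment X at case markers."""
    if not words or words[0] not in CASE_FRAMES:
        return None
    markers = CASE_FRAMES[words[0]]
    segments: Dict[str, List[str]] = {"direct-object": []}
    current = "direct-object"        # unmarked words before any marker
    for word in words[1:]:
        if word in markers:
            current = markers[word]  # following words fill this case
            segments[current] = []
        else:
            segments[current].append(word)
    return segments


result = split_on_case_markers(
    "replace the display terminal with a teletype".split())
```

Each resulting segment would then become a task of recognizing an object of the type its case requires.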

For an example of flexibility, suppose the case marker "with" is missing, so that the two component
phrases cannot be isolated. The strategy then sets up tasks to recognize each of the missing case
fillers in the string that it cannot split up. Since the strategies always operate to recognize as much of
the given subsequence as possible as the requested category, but will ignore parts that they cannot
deal with, the attempt to recognize (in left-anchored mode) a component in "the display terminal a
teletype", will recognize "the display terminal", fail to recognize "a teletype", but isolate it, thus
leading to its recognition on the second attempt to parse still unrecognized strings as the fillers of
unfilled case frame slots.

Of course, there is also no guarantee, given the many roles that individual prepositions fill, that a
case marker that is found is really a case marker for the given case frame, as in:

replace the display terminal with a teletype with a paper-tape reader
Here  both  "with"s  are  found,  leading  to  two  different  ways  in  which  the  input  can  be  split  up  for 
further  parsing.  The  correct  reading  is  finally  preferred  because  it  accounts  for  more  of  the  input,  the 
strong  domain  constraints  making  it  easy  for  the  parser  to  refuse  to  accept  "the  display  terminal  with 
a  teletype"  as  a  <component-set>. 
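
The preference for the reading that accounts for more of the input can be illustrated with a small Python sketch. Everything domain-specific here is an assumption made for the example: the component vocabulary, the rule that only a teletype may carry a paper-tape reader, and the scoring by number of recognized words.

```python
from typing import List, Tuple

COMPONENTS = [("display", "terminal"), ("teletype",), ("paper-tape", "reader")]
# Hypothetical domain constraint: which component may carry which attachment.
ATTACHABLE = {(("teletype",), ("paper-tape", "reader"))}
DETERMINERS = {"a", "the"}


def component_at(words, i):
    """Return (component, next index) for '<det> <component>' at i, else None."""
    if i < len(words) and words[i] in DETERMINERS:
        for comp in COMPONENTS:
            if tuple(words[i + 1:i + 1 + len(comp)]) == comp:
                return comp, i + 1 + len(comp)
    return None


def recognized_prefix(words: List[str]) -> int:
    """Left-anchored: number of leading words accepted as a <component-set>."""
    head = component_at(words, 0)
    if head is None:
        return 0
    comp, end = head
    if end < len(words) and words[end] == "with":
        tail = component_at(words, end + 1)
        if tail is not None and (comp, tail[0]) in ATTACHABLE:
            return tail[1]
    return end


def best_split(x: List[str], marker: str = "with") -> Tuple[List[str], List[str]]:
    """Try each occurrence of the marker; keep the split covering most words."""
    positions = [i for i, w in enumerate(x) if w == marker]
    if not positions:
        return x, []

    def coverage(i: int) -> int:
        return recognized_prefix(x[:i]) + recognized_prefix(x[i + 1:])

    best = max(positions, key=coverage)
    return x[:best], x[best + 1:]


x = "the display terminal with a teletype with a paper-tape reader".split()
obj, replacement = best_split(x)
```

Splitting at the first "with" lets both segments be recognized in full, whereas splitting at the second leaves "with a teletype" unaccounted for, so the first split wins.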

Implementation  of  the  above  design  did  not  start  during  the  contract  period  that  is  the  subject  of 
the  present  report.  However,  at  the  time  of  writing,  implementation  has  begun. 

2.4.  Design  and  Development  of  the  Flexible  Parser  for  the  Cousin  Interface 

At an early stage in the development of the multi-strategy, construction-specific approach to
parsing restricted domain natural language, it became apparent to us that a similar approach could
be used to parse artificial command languages as well. Accordingly, starting from the beginning of
the contract period that is the subject of the present report, we began to develop a flexible parser
based on this approach for the Cousin interface to the Unix operating system, which we are
developing under other funding, and which uses an extended version of the standard artificial Unix
command language for input. This effort constituted a development track for the
construction-specific approach parallel to that represented by CASPAR and DYPAR and their successor,
MULTIPAR. The two tracks, however, are not completely independent, since several of the specific
techniques developed for CASPAR also turned out to be useful for the Cousin parser, as the
description in the remainder of this section will show. More details of the Cousin system and its
parser can be found in [6].

The  command  language  for  Cousin  is  the  present  Unix  language,  minus  the  constructions  at  a 
level  higher  than  single  commands,  but  supplemented  by  other  language  features  that  make  it  easier 
for  the  user  to  specify  commands.  The  standard  Unix  format  for  command  lines  is: 

<command-name> <options> <arguments>

where <options> is a possibly empty sequence of flags, single characters preceded by dashes, and
option markers, also single characters preceded by dashes which identify the next input token as an
optional parameter. The <arguments> are a fixed order sequence of parameters to the command that
are not identified by any markers, although they may in some cases be optional. An example is:
cc -w -O -o bar foo.c fum.c

which is a call to the C language compiler (cc) with options "w" (suppression of warning diagnostics)
and "O" (object code improvement), a flagged option "o" (which writes output to the file named
"bar"), and two arguments foo.c and fum.c, the files to be compiled. Conceptually, cc actually has
one argument, the file to be compiled, which may be filled an arbitrary number of times; this type of
argument is called a multiple argument. A command with two arguments is "cp", which copies a list
of files, its first argument, into a directory, its second argument, as in:
cp file1 file2 dir
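
The standard format just described can be sketched in Python as follows. The syntax table is a hypothetical stand-in for a per-command syntax description: the flag and option letters are those named in the text, but the table layout, the trailing "*" used to mark a multiple argument, and the assumption that every command has exactly one multiple argument are invented for illustration.

```python
from typing import Dict, List

# Hypothetical syntax descriptions; a trailing "*" marks a multiple argument.
SYNTAX = {
    "cc": {"flags": {"w", "O"}, "options": {"o"}, "arguments": ["files*"]},
    "cp": {"flags": set(), "options": set(),
           "arguments": ["files*", "directory"]},
}


def parse_standard(line: str) -> Dict[str, object]:
    tokens = line.split()
    spec = SYNTAX[tokens[0]]
    flags: List[str] = []
    options: Dict[str, str] = {}
    positional: List[str] = []
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-") and tok[1:] in spec["flags"]:
            flags.append(tok[1:])
        elif tok.startswith("-") and tok[1:] in spec["options"]:
            if i + 1 >= len(tokens):
                raise ValueError(f"option {tok} needs a value")
            options[tok[1:]] = tokens[i + 1]   # marker claims the next token
            i += 1
        elif tok.startswith("-"):
            raise ValueError(f"unknown option {tok}")
        else:
            positional.append(tok)
        i += 1

    names = spec["arguments"]
    if len(positional) < len(names):
        raise ValueError("too few arguments")
    # Fixed arguments after the multiple one are taken from the end of the line.
    star = next(k for k, name in enumerate(names) if name.endswith("*"))
    cut = len(positional) - (len(names) - star - 1)
    parsed = {"command": tokens[0], "flags": flags, "options": options}
    parsed.update(zip(names[:star], positional[:star]))
    parsed[names[star][:-1]] = positional[star:cut]
    parsed.update(zip(names[star + 1:], positional[cut:]))
    return parsed
```

For instance, `parse_standard("cp file1 file2 dir")` assigns the last positional token to `directory` and the preceding ones to the multiple argument `files`.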

Cousin makes two extensions to the standard Unix language: the addition of explicit markers for
command arguments as a supplement to the present system of purely positional specification, and
the addition of full word flags and markers for options as a supplement to the present system of single
characters preceded by dashes. So the above examples could be written, for instance, as:

cc -O no-warnings foo.c fum.c output-to bar
cp onto dir from file1 file2

When whole-word markers are used, the ordering restrictions of standard Unix are relaxed. Note that
this extension makes the language similar in many ways to the kind of language handled by CASPAR:
a command verb followed by a set of marked cases. The major differences are that some case
markers stand by themselves and have no fillers, and that the Unix positional syntax is still included in
the language. This similarity is exploited by some of the flexible parsing techniques described below.
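
To make the case-like structure of the extended syntax concrete, here is a small sketch of a marker scan over whole-word keywords. It is an illustration under stated assumptions, not CASPAR's or Cousin's actual algorithm: the function marker_scan, the marker tables, and the case name OUTPUT are hypothetical, while SOURCE and DESTINATION follow the names used for cp later in this section.

```python
def marker_scan(tokens, arg_markers, word_flags, multiple=()):
    """Scan tokens for whole-word markers (hypothetical helper).

    arg_markers -- maps a marker keyword to the case it introduces
    word_flags  -- maps a keyword with no filler to the flag it stands for
    multiple    -- names of cases that may take several fillers
    """
    cases, flags, unmarked = {}, [], []
    current = None  # case currently collecting fillers, if any
    for tok in tokens:
        if tok in word_flags:
            # A marker that stands by itself and has no filler.
            flags.append(word_flags[tok])
            current = None
        elif tok in arg_markers:
            current = arg_markers[tok]
            cases.setdefault(current, [])
        elif current is not None:
            cases[current].append(tok)
            if current not in multiple:
                current = None  # an ordinary case takes exactly one filler
        else:
            # Left over for positional handling.
            unmarked.append(tok)
    return cases, flags, unmarked


CP_MARKERS = {"from": "SOURCE", "onto": "DESTINATION"}
print(marker_scan("onto dir from file1 file2".split(),
                  CP_MARKERS, {}, multiple={"SOURCE"}))
```

Tokens that no marker claims are returned separately, which reflects the point that the positional syntax remains part of the language alongside the marked cases.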


The  multi-strategy  construction-specific  parsing  algorithm  that  we  have  so  far  developed  for  this 
language  is  as  follows: 

1. Command Identification: In much the same way as CASPAR finds the verb of its
sentence, the Cousin parser determines which command is being invoked, and locates
the syntax description (positional and case information) for the command.

2. Standard Unix parsing: Using this syntax information, the remaining part of the
command line is parsed as though it conformed to the standard Unix syntax for that
command, taking only Unix-style options and positionally specified arguments into
account. If this step is successful, parsing is complete, and no attempt is made to use the
case-style syntax. This ensures that correct Unix commands which happen by
coincidence to use case marker keywords will be recognized correctly.

3. Extended Unix parsing: If the standard parse is unsuccessful in any way, the next step
is to parse the line according to the extended syntax. The procedure here is the CASPAR
case marker scanning algorithm, modified only to deal with case markers with no
corresponding case fillers; i.e., a scan is made for any argument marker keywords, or any
option keywords, and the arguments and options thus flagged are extracted.

4. Flexible Unix parsing: If any of the input string is still not accounted for
after the extended parse, a more flexible algorithm is applied. This algorithm is designed to deal
with situations in which the user has:

•  used a mixture of marker and positional notation

•  misspelt input tokens, either arguments or markers

•  used  positional  notation  in  the  standard  Unix  style,  but  has  got  the  arguments  out 
of  order 

•  omitted  one  or  more  required  arguments 

•  used  standard  dash  notation  with  single  character  flags  and  markers  for  options, 
but  has  omitted  the  dash  or  put  the  option  string  other  than  at  the  beginning  of  the 
input 

Two basic techniques are involved in this flexible style of parsing: scanning for misspelt
markers and options, and comparing permutations of the arguments against the input
tokens. The first of these is a CASPAR-style marker scan, with the possible targets for
correct spellings restricted to the markers of the arguments not yet filled. The second
technique is specific to the positional style of construction allowed by Unix, and is kept
combinatorially tractable by the fact that no Unix command has more than three
arguments.
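
The permutation technique named in step 4 can be sketched as follows. This is a simplified illustration rather than the Cousin algorithm itself: the names match_in_order and permutation_parse are hypothetical, each argument slot is assumed to carry a predicate saying which tokens it accepts, and a multiple argument greedily takes as many consecutive tokens as it accepts.

```python
from itertools import permutations


def match_in_order(tokens, slots):
    """Try to fill the slots, in the given order, from left to right.

    Each slot is (name, accepts, multiple): accepts is a predicate on a
    token, and multiple says whether the slot may take several tokens.
    Returns a dict of fillers, or None if this ordering does not fit.
    """
    filled, i = {}, 0
    for name, accepts, multiple in slots:
        start = i
        while i < len(tokens) and (multiple or i == start) and accepts(tokens[i]):
            i += 1
        if i == start:
            return None  # this slot received no token
        filled[name] = tokens[start:i]
    return filled if i == len(tokens) else None


def permutation_parse(tokens, slots):
    """Return the first ordering of the slots that accounts for every token."""
    for order in permutations(slots):
        filled = match_in_order(tokens, order)
        if filled is not None:
            return filled
    return None


# Assumed environment for the example: two readable files and one directory.
files, dirs = {"file1", "file2"}, {"dir"}
CP_SLOTS = [("SOURCE", files.__contains__, True),
            ("DESTINATION", dirs.__contains__, False)]
print(permutation_parse(["dir", "file1", "file2"], CP_SLOTS))
```

With at most three argument slots there are at most six orderings to try, which is why the search stays cheap. This sketch treats every slot as required; handling omitted arguments would need an additional pass over subsets of the slots.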

An example will illustrate how this algorithm operates. Suppose, for instance, that the user types:
cp onto dir form fil2 file3
when he really intended to type:
cp onto dir from file2 file3

Assume that "dir" is a valid directory name and that "file2" and "file3" are valid file names, but that "onto", "form",
and "from" are not valid files or directories. The command cp has two arguments, SOURCE and
DESTINATION. SOURCE is a multiple argument of readable files. DESTINATION is an ordinary
argument of either a writable directory, or a creatable file (which may or may not already exist). There
is an additional restriction that if DESTINATION is a file, SOURCE may contain only one file. The
default order is SOURCE DESTINATION.

Standard Unix syntax does not work, so extended Unix syntax is tried. The marker scan comes up
with "onto", and "dir" is recognized as a proper DESTINATION, leaving just three remaining
tokens which could be assigned to SOURCE. However, "form" and "fil2" fail to match
SOURCE, so extended Unix syntax does not work either, and flexible parsing must be tried. Note that if
"form" and "fil2" were suitable files for SOURCE, there would have been no need to employ the extra
flexibility. The first flexible step is to scan for misspelt markers from left to right. Extended Unix
syntax has already accounted for "onto" and "dir", so the scan starts from "form", which is of course
corrected to "from". Since "from" is the marker for SOURCE, "fil2" is required to fill the SOURCE
argument; and since "file3" satisfies the restrictions for SOURCE, and SOURCE is a multiple
argument, "file3" is also taken into the SOURCE argument. Since "fil2" is required to go into
SOURCE, the fact that it fails the restrictions on the argument triggers an immediate attempt to
spelling-correct it. This attempt succeeds, and the parse is correct and complete, without it being necessary
to invoke the second, permutation phase of flexibility.
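
The misspelling scan in this example can be approximated with Python's standard difflib module. The sketch below is an assumption-laden stand-in: the report does not say how Cousin scores spelling similarity, so difflib's ratio is used purely for illustration, and the helper names correct and flexible_marker_scan are hypothetical. For brevity it handles only the situation in the example, where a single unfilled argument remains and every later token is a filler for it.

```python
import difflib


def correct(token, candidates):
    """Return the closest candidate to token, or None if nothing is close.

    An exact match scores highest, so correctly spelt tokens pass through.
    """
    matches = difflib.get_close_matches(token, candidates, n=1, cutoff=0.6)
    return matches[0] if matches else None


def flexible_marker_scan(tokens, unfilled_markers, valid_fillers):
    """Scan left to right over tokens not yet accounted for.

    unfilled_markers -- marker keyword -> argument name, for unfilled arguments only
    valid_fillers    -- argument name -> set of values satisfying its restrictions
    """
    filled, current = {}, None
    for tok in tokens:
        if current is None:
            # The first unaccounted token must be a (possibly misspelt) marker.
            marker = correct(tok, list(unfilled_markers))
            if marker is None:
                return None
            current = unfilled_markers[marker]
            filled[current] = []
        else:
            # A token required to fill the argument is spelling-corrected
            # against the values that satisfy the argument's restrictions.
            value = correct(tok, sorted(valid_fillers[current]))
            if value is None:
                return None
            filled[current].append(value)
    return filled


# "onto dir" was already consumed by the extended parse, so only the
# marker for SOURCE remains as a correction target.
print(flexible_marker_scan(["form", "fil2", "file3"],
                           {"from": "SOURCE"},
                           {"SOURCE": {"file2", "file3"}}))
```

Restricting the correction targets to the markers of unfilled arguments, as the text describes, keeps "form" from being compared against "onto" at all. A token such as "fil2" can be close to more than one valid file; this sketch simply takes the best match, whereas a parser aiming at graceful interaction might keep such alternatives for the user to resolve.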


The implementation of the flexible parsing algorithm described in this section was completed
during the contract period covered by this report, and incorporated into the Cousin user-friendly
interface. The algorithm has proved efficient in the recognition of grammatical input, and robust in its
handling of ungrammatical input. In addition, its construction-specific character has made it easy to
produce the localized representations of ambiguity in its output which are so important for graceful
interaction with the user to resolve the ambiguity (see [4]).